Bayesian Approximate Kernel Regression with Variable Selection

Authors

  • Lorin Crawford
  • Kris C. Wood
  • Xiang Zhou
  • Sayan Mukherjee
Abstract

Nonlinear kernel regression models are often used in statistics and machine learning because they tend to be more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an analog of the effect size of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant, for example the Gaussian kernel. We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that (1) captures nonlinear structure and (2) can be projected onto the original explanatory variables. The projection onto the original explanatory variables serves as the analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. By adapting some classical results in compressive sensing, we state conditions under which BAKR can recover a sparse set of effect sizes, that is, perform variable selection and regression simultaneously. We illustrate the utility of BAKR by examining, in some detail, two important problems in statistical genetics: genomic selection (predicting phenotype from genotype) and association mapping (inference of significant variables or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings. We provide empirical evidence that BAKR performs as well as or better than top methods for both genomic selection and association mapping. An observation relevant to genetics is that BAKR and nonlinear regression models tend to have a greater advantage over linear models when the observed samples are related.

arXiv:1508.01217v2 [stat.ME] 21 Apr 2016
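
The random Fourier approximation mentioned in the abstract can be illustrated with a short sketch. This is not the authors' BAKR implementation: the function names, the bandwidth, the number of random features, the ridge penalty, and the simulated data are all assumptions made for illustration. In particular, the last step reads "projected onto the original explanatory variables" as a least-squares projection of the fitted function values through the pseudoinverse of the design matrix, and it uses a plain ridge fit in place of a Bayesian posterior.

```python
import numpy as np


def gaussian_kernel(X, bandwidth):
    """Exact Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def random_fourier_features(X, n_features, bandwidth, rng):
    """Random cosine features whose inner products approximate the Gaussian kernel."""
    p = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, bandwidth^-2 I).
    W = rng.normal(scale=1.0 / bandwidth, size=(p, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
# Simulated response (an assumption): only the first variable matters, nonlinearly.
y = 2.0 * np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Step 1: compare the exact kernel with its random Fourier approximation.
K_exact = gaussian_kernel(X, bandwidth=2.0)
Z = random_fourier_features(X, n_features=2000, bandwidth=2.0, rng=rng)
K_approx = Z @ Z.T
max_abs_error = np.max(np.abs(K_exact - K_approx))

# Step 2: ridge regression on the approximate kernel (dual form),
# standing in for the Bayesian model described in the abstract.
ridge_penalty = 0.1
alpha = np.linalg.solve(K_approx + ridge_penalty * np.eye(n), y)
f_hat = K_approx @ alpha

# Step 3: project the fitted nonlinear function onto the original variables.
# beta_tilde plays the role of an effect-size analog in this sketch.
beta_tilde = np.linalg.pinv(X) @ f_hat

print("max |K_exact - K_approx|:", max_abs_error)
print("projected effect sizes:", np.round(beta_tilde, 3))
```

In this simulated setting the first entry of `beta_tilde` should dominate the others, since only the first variable drives the response. Increasing `n_features` tightens the kernel approximation at the cost of a wider feature matrix.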

Similar resources

Adaptive Markov chain Monte Carlo for Bayesian Variable Selection

We describe adaptive Markov chain Monte Carlo (MCMC) methods for sampling posterior distributions arising from Bayesian variable selection problems. Point mass mixture priors are commonly used in Bayesian variable selection problems in regression. However, for generalized linear and nonlinear models where the conditional densities cannot be obtained directly, the resulting mixture posterior may...

Scalable Bayesian Kernel Models with Variable Selection

Nonlinear kernels are used extensively in regression models in statistics and machine learning since they often improve predictive accuracy. Variable selection is a challenge in the context of kernel based regression models. In linear regression the concept of an effect size for the regression coefficients is very useful for variable selection. In this paper we provide an analog for the effect ...

Automatic Kernel Selection for Gaussian Processes Regression with Approximate Bayesian Computation and Sequential Monte Carlo

The current work introduces a novel combination of two Bayesian tools, Gaussian Processes (GPs), and the use of the Approximate Bayesian Computation (ABC) algorithm for kernel selection and parameter estimation for machine learning applications. The combined methodology that this research article proposes and investigates offers the possibility to use different metrics and summary statistics of...

Nonparametric Bayesian Kernel Models

Kernel models for classification and regression have emerged as widely applied tools in statistics and machine learning. We discuss a Bayesian framework and theory for kernel methods, providing a new rationalization of kernel regression based on nonparametric Bayesian models. Functional analytic results ensure that such a nonparametric prior specification induces a class of functions that span ...

Non-parametric Bayesian Kernel Models

Kernel models for classification and regression have emerged as widely applied tools in statistics and machine learning. We discuss a Bayesian framework and theory for kernel methods, providing a new rationalisation of kernel regression based on non-parametric Bayesian models. Functional analytic results ensure that such a non-parametric prior specification induces a class of function...

Publication date: 2015